ABSTRACT
We compare here two schemes of accessing a single critical
resource in a multiprocessor system: a buffered tree network (with the
processors being the leaves) and an unbuffered network (where message
retransmissions are used for messages aborted due to conflicts). We
prove that the two schemes have comparable performance. This should be
contrasted with the case of many critical resources and banyan networks
(in which case the buffered networks perform substantially
better). We also discuss the related problem of accessing a single
critical resource in the P-RAM model of parallel computation, by using
N processors.
1.  The problem
A critical resource CR is assumed to be connected to the root node
of a complete binary tree network of N leaves and 2N-1 nodes. Let the
leaves of the tree be called $l_1, \ldots, l_N$. N packets ($p_1, \ldots, p_N$) are at
time 0 located at the leaves, packet $p_i$ at leaf $l_i$ ($i = 1, \ldots, N$). Let
the root be at level $h = \log N$ and the leaves at level 0. At each time
unit, all packets of level m are transmitted to level m + 1 (m = 0,
1, ..., h-1). We want to compare the performance of two protocols: In
protocol A, the nodes of the tree are assumed to hold buffers of size
$2^i$ for nodes of level i. In protocol B, the nodes are unbuffered. The
performance measure will be the number of time units that the N packets
$p_1, \ldots, p_N$ need so that each of them accesses the critical resource.

Protocol A: (For each node)

During each time unit:

(1) If the node's queue is nonempty, then the node sends one of the
packets (taken from the queue) up one level.

(2) Receive packets (if any) from the lower level.

(3) Queue the packets received.

Protocol B: (For each node)

During each time unit:

(1) If the node holds a packet, it sends the packet up one level.

(2) Receive packets from the lower level (if any).

(3) If more than one packet is received, then abort
all but one of them.

Here we assume that the aborted packets are (ideally) resubmitted
at their origin nodes, immediately after abortion. In practice, either
the network has to notify the origin nodes that their packets were
aborted (in which case we need a buffered scheme for the notifications)
or the origin nodes time out the packets sent. (Here, it is usually
the case that the packets which access the critical resource can travel
the tree backwards, without conflicting with up-going packets.) If
packet $p_i$ does not return after 2 log N time units, then the leaf node
$l_i$ resubmits it. We shall provide analyses of both the ideal protocol
B and its implementation using timeouts.
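
The two protocols can be tried out on small trees with a short simulation. The sketch below is only one possible reading of Protocol A and of the ideal Protocol B: the heap-style node numbering, the rule that the packet from the lowest-numbered leaf survives a conflict, and the function names `simulate_buffered` and `simulate_unbuffered` are assumptions made for illustration and are not taken from the text. Time is counted until the root has sent the last packet to CR, and N is assumed to be a power of two.

```python
from collections import deque


def simulate_buffered(n):
    """Protocol A: time units until all n packets have left the root."""
    # Heap layout (an assumption of this sketch): node 1 is the root,
    # node k has parent k // 2, and the leaves are nodes n .. 2n-1.
    queues = {k: deque() for k in range(1, 2 * n)}
    for leaf in range(n, 2 * n):
        queues[leaf].append(leaf)  # a packet is named after its origin leaf
    delivered, t = 0, 0
    while delivered < n:
        t += 1
        # Step (1): every node with a nonempty queue sends one packet up.
        sent = [(k, q.popleft()) for k, q in queues.items() if q]
        # Steps (2)-(3): parents queue what they receive; a packet sent by
        # the root reaches the critical resource.
        for k, packet in sent:
            if k == 1:
                delivered += 1
            else:
                queues[k // 2].append(packet)
    return t


def simulate_unbuffered(n):
    """Ideal Protocol B: aborted packets reappear at their leaf at once."""
    held = {k: None for k in range(1, 2 * n)}
    for leaf in range(n, 2 * n):
        held[leaf] = leaf
    delivered, t = 0, 0
    while delivered < n:
        t += 1
        # Step (1): every node holding a packet sends it up one level.
        sent = [(k, p) for k, p in held.items() if p is not None]
        for k, _ in sent:
            held[k] = None
        arrivals = {}
        for k, packet in sent:
            if k == 1:
                delivered += 1
            else:
                arrivals.setdefault(k // 2, []).append(packet)
        # Steps (2)-(3): keep one arrival per node and abort the others.
        # Keeping the lowest-numbered packet is an arbitrary choice.
        for node, packets in arrivals.items():
            winner = min(packets)
            held[node] = winner
            for packet in packets:
                if packet != winner:
                    held[packet] = packet  # resubmitted at its origin leaf
    return t
```

Calling `simulate_buffered(n)` and `simulate_unbuffered(n)` for small powers of two gives completion times that can be compared with the bounds derived in Sections 2 and 3. A different conflict rule in step (3) of Protocol B may give different times.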


2.  Analysis of the buffered case
Let $T^A_h(N)$ be the number of time units needed for all N packets to
arrive at the critical resource, passing through a tree of height h.
Then

$$T^A_h(N) = 1 + T^A_{h-1}(N/2) + T^A_{h-1}(N/2)$$

where the term 1 accounts for the first time unit, the second term for
the N/2 winners to go, and the third term for the N/2 losers to go, i.e.

$$T^A_h(N) = 1 + 2\,T^A_{h-1}(N/2) \qquad (*)$$

(with $T^A_0(1) = 1$).

(*) implies $T^A_{\log N}(N) = 2^{\log N + 1} - 1 = 2N - 1$.

Hence,
Corollary 1  It takes 2N-1 time units for the buffered network to
process all N packets.
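
The closed form can be checked numerically by evaluating the recurrence (*) directly. The following is a minimal sketch; the function name `buffered_time` is hypothetical, and N is assumed to be a power of two.

```python
def buffered_time(n):
    """Evaluate T_h(N) = 1 + 2 * T_{h-1}(N/2) with T_0(1) = 1.

    n is assumed to be a power of two, so that h = log2(n).
    """
    if n == 1:
        return 1
    return 1 + 2 * buffered_time(n // 2)


# The recurrence agrees with the closed form 2N - 1 for N = 1, 2, ..., 1024.
for h in range(11):
    n = 2 ** h
    assert buffered_time(n) == 2 * n - 1
```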


3.  Analysis of the unbuffered case
3.a.  Analysis assuming instantaneous resubmission.
Let $T^B_h(N)$ be the number of time units necessary for all N packets
to appear at the tree's root, when the tree is of height h.

$T^B_{\log N}(N)$ is not worse than the time $\tilde{T}^B_{\log N}(N)$ it takes for all N
packets to reach the root, under the following modified protocol:

(1) The packets which come from the left subtree of the root are always
aborted when they compete with packets of the right subtree.

(2) The packets of the left subtree are not resubmitted instantaneously.
Instead, they wait for all the packets of the right subtree of the root to
be processed. Then they are all (simultaneously) submitted again.

We follow this protocol (recursively) within each subtree.
Clearly then

$\tilde{T}^B_h(N)$ = sum of

(1) 1 + time needed for all packets of the right subtree of the root to
access the right child, B, of the root (See Fig. 1).

(2) 1 + time needed for all packets of the left subtree to access the left
child of the root, A.

So

$$\tilde{T}^B_h(N) = 2 + 2\,\tilde{T}^B_{h-1}(N/2)$$

implying (again assuming $\tilde{T}^B_0(1) = 1$)

$$\tilde{T}^B_{\log N}(N) = 2 + 2^2 + \cdots + 2^{\log N} + 2^{\log N}$$

i.e.

$$\tilde{T}^B_{\log N}(N) = 3N - 2$$

Hence

$$T^B_{\log N}(N) \le 3N - 2$$

So

Corollary 2  It takes at most 3N-2 time units for the unbuffered network
to process all N packets.
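
The recurrence for the modified protocol and its unrolled sum can be evaluated side by side as a sanity check. This is a small sketch with hypothetical function names (`modified_protocol_time`, `unrolled_sum`), assuming N is a power of two and a base value of 1 for a single node.

```python
def modified_protocol_time(n):
    """Evaluate T_h(N) = 2 + 2 * T_{h-1}(N/2) with T_0(1) = 1."""
    if n == 1:
        return 1
    return 2 + 2 * modified_protocol_time(n // 2)


def unrolled_sum(n):
    """Evaluate 2 + 2^2 + ... + 2^h + 2^h, where h = log2(n)."""
    h = n.bit_length() - 1  # exact log2 for a power of two
    return sum(2 ** k for k in range(1, h + 1)) + 2 ** h


# Both expressions agree with the closed form 3N - 2.
for h in range(11):
    n = 2 ** h
    assert modified_protocol_time(n) == unrolled_sum(n) == 3 * n - 2
```

The last term $2^{\log N}$ in the sum comes from the $2^{\log N}$ copies of the base case that remain after the recurrence has been unrolled log N times.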

3.b.  The case with timeouts.
If $T^{B,\mathrm{to}}_{\log N}(N)$ is the time needed for all N packets to appear at the
root of the tree of height log N, by using timeouts, then we note that

$$T^{B,\mathrm{to}}_{\log N}(N) = T^B_{\log N}(N) + 2 \log N$$

(since the resubmission times in the case of timeouts are produced from
the resubmission times in the ideal case by a single shift of the time
axis by 2 log N). Hence,

$$T^{B,\mathrm{to}}_{\log N}(N) \le 3N + 2 \log N - 2$$

Also, since $T^B_{\log N}(N) \ge N$, we conclude that

$$T^{B,\mathrm{to}}_{\log N}(N) \ge N + 2 \log N$$

Hence

Corollary 3  It takes at most 3N + 2 log N - 2 time units (and at least N
+ 2 log N) for the unbuffered network with timeouts to process all N
packets.
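
For concrete values of N, the bounds of Corollary 3 can be tabulated directly. The helper below is a hypothetical illustration (the name `timeout_bounds` does not come from the text) and assumes that N is a power of two and that log denotes the base-2 logarithm.

```python
def timeout_bounds(n):
    """Return (lower, upper) bounds of Corollary 3 for n packets.

    lower = N + 2 log N, upper = 3N + 2 log N - 2, with log taken base 2.
    """
    h = n.bit_length() - 1  # exact log2 for a power of two
    return n + 2 * h, 3 * n + 2 * h - 2


for n in (2, 8, 64, 1024):
    lower, upper = timeout_bounds(n)
    print(f"N={n}: between {lower} and {upper} time units")
```

Both bounds grow linearly in N, and the additive 2 log N term contributed by the timeout becomes negligible for large N.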


4.      Conclusions 

The  time  needed  for  all  N  packets  to  access  the  critical  resource 
is  of  the  same  order  of  magnitude  in  both  the  unbuffered  and  buffered 
network cases. In both cases the time is $\Theta(N)$ and

Appendix

It is interesting to consider the corresponding problem in the
P-RAM model of parallel computation. Here, we are given N processors,
and a critical memory location, M, in the global memory. We want to
estimate the parallel time needed so that all N processors access M.
The obvious way to solve the problem is to sort the processors by id
number and use the sorted order for accessing M. The parallel time
needed  is  N  +  O(log^N). 

One way to lower the o(N) term is to use randomization. Consider
the following algorithm, which proceeds in phases: During each phase
all processors draw independently a random integer out of the set
{1,...,N}.

Those who are assigned the value N are considered to be "winners".
They count themselves. If a unique winner exists, then it accesses the
resource M, else no processor accesses M and they proceed to the next
phase. The probability that a given processor is the unique winner in a
phase is

$$\frac{1}{N}\left(1 - \frac{1}{N}\right)^{N-1} \approx \frac{1}{eN}$$


So, the average number of phases is eN, i.e. about 2.72 N. A phase may
take as much as O(log N) time due to the counting of winners. However, the
average number of winners per phase is constant. We conclude that the
average parallel time of this scheme is eN.
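
The probability estimate can be checked numerically, and a single phase is easy to sketch. In the code below the function names are hypothetical, and the displayed probability is read as the chance that one given processor is the only one to draw the value N; that reading is an assumption of this sketch. Processors are indexed from 0 for convenience.

```python
import math
import random


def unique_winner_probability(n):
    """Chance that one given processor alone draws the value n."""
    return (1 / n) * (1 - 1 / n) ** (n - 1)


def run_phase(n, rng):
    """Simulate one phase; return the unique winner's index, or None."""
    winners = [i for i in range(n) if rng.randint(1, n) == n]
    return winners[0] if len(winners) == 1 else None


n = 1024
p = unique_winner_probability(n)
# For large N the exact value is close to 1 / (e N), so a given processor
# waits about e N phases on average before it is the unique winner.
print(p * math.e * n)
print(run_phase(n, random.Random(0)))
```

The first printed ratio approaches 1 as N grows, which is the sense in which the probability is approximately 1/(eN).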

We don't know of any deterministic parallel algorithm on an
N-processor P-RAM which outperforms the above probabilistic technique.